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Summary. We present a nonparametric method for estimating the value and several derivatives of 
an unknown, sufficiently smooth real-valued function of real-valued arguments from a finite sample 
of points, where both the function arguments and the corresponding values are known only up 
to measurement errors having some assumed distribution and correlation structure. The method, 
Moving Taylor Bayesian Regression (MOTABAR), uses Bayesian updating to find the posterior mean 
of the coefficients of a Taylor polynomial of the function at a moving position of interest. When 
measurement errors are neglected, IVIOTABAR becomes a multivariate interpolation method. It 
contains several well-known regression and interpolation methods as special or limit cases. We 
demonstrate the performance of MOTABAR using the reconstruction of the Lorenz attractor from 
noisy observations as an example. 
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1. Introduction 

1.1. Motivation 

The basic task of regression analysis - the estimation of values of an unknown, sufficiently smooth 
function / of one or several real arguments, given a finite amount of possibly noisy data - occurs 
pervasively in many kinds of quantitative scientific research. Most existing approaches to this task 
can roughly be classified into two groups: parametric or model-based approaches such as poly- 
nomial regression, spline smoothing (Reinsch, 1967), or kriging (Krige, 1952), and nonparametric 
approaches such as local regression, nearest or natural neighbour estimation (Sibson, 1981), inverse 
distance weighting (Shepard, 1968), or kernel smoothing. Although parametric methods are usu- 
ally based on more rigorous reasoning such as maximum likelihood estimation, Bayesian updating, 
approximation theory, or some other kind of optimization, their results are only guaranteed to be 
reliable if the sought function is assumed to belong to some particular model or function class with a 
small number of parameters, e. g., polynomials or splines of a fixed degree and fixed set or number 
of break points. If, as is often the case in empirical research, this strong assumption can not be 
justified, nonparametric methods are available which, however, are usually based on various forms 
of more heuristic reasoning, frequently involve the choice of several control parameters such as the 
number of neighbours or the choice of a weight or kernel function, and sometimes do not provide 
an easily interpreted assessment of reliability such as a standard error or confidence interval. Also, 
many regression methods are not or only restrictedly applicable if one or several of the following 
conditions apply: (i) there might be measurement errors not only in the function values (the "depen- 
dent" variable) but also in the function arguments (the "independent" variables), (ii) measurement 
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Fig. 1. (Colour online) Illustration of the effect of error correlation on some MOTABAR estimates. 



errors might not be independent and identically distributed, (iii) the function is of more than one 
real argument, (iv) the measured sample is irregularly distributed in the function's argument space, 
and/or (v) also several derivatives of the function shall be estimated. 

In order to illustrate how plausible estimates can depend on the assumed amount and correlation 
of measurement errors, consider the minimal data (3, 3), (6, 6), and (9, 3) and assume that we want 
to estimate / and /' on the interval [0, 12] on the basis of this data. In the left diagram in Fig. 1, four 
different such estimates of / are shown, which were produced with the method we will describe in 
this article, using different assumptions on the measurement error contained in the data. Without 
measurement errors, i. e., if both x and y data are precise, it is plausible to estimate /(3) = 3 and 
/'(3) ~ 1, as on the blue dotted line. Without errors in x but with large independent errors in y, the 
apparent slope becomes quite uncertain, and one would rather estimate /(3) by the sample mean, 4, 
as on the yellow line. However, if the errors in y are highly correlated, e. g., because of a systematic 
but unknown bias in the measurement equipment, then one knows that the data is basically only 
shifted in the y direction, which has no influence on slopes, so one would probably expect /'(3) > 
again, as on the green dashed line. The smaller the errors, the more the slope should resemble 1, as 
on the cyan dash-dotted line. Similarly, if the y data is precise but the x data is highly uncertain, one 
would use the sample estimate if errors are uncorreiated, as on the yellow line in the right diagram 
in Fig. 1. If errors in x are correlated, one would retain some slope information which suggests that 
the values to the left and right of the three data points are rather below than above three. Hence the 
estimate would be lower than with uncorreiated errors, as on the green dashed line. The occurrence 
of this lowering effect also shows that the effect of errors in the x dimension is usually not the same 
as the effect of some "equivalent" amount of errors in the y dimension. 



1.2. Moving Taylor Bayesian Regression 

In this article, we develop a regression method which is nonparametric in the sense that it can 
estimate any sufficiently smooth function, but is still based on rigorous statistical reasoning, can 
deal with any of the above situations (i)-(v), and contains several existing methods as special or 
limiting cases. The method. Moving Taylor Bayesian Regression (MOTABAR), is based on the 
following ideas: 



(a) Approximate / locally by a Taylor polynomial at a moving position of interest ^. 
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(b) Treat the unknown Taylor coefficients as parameters in a statistical model. 

(c) Use the measured data to update prior beliefs about these parameters using Bayesian updating. 

(d) Use the posterior mean (or mode) and variance of the parameters as estimates of the function's 
value and derivatives and the corresponding estimation uncertainty. 

In principle, these steps can be performed for any form of error and prior distribution, but the method 
becomes particularly transparent if the value measurement error is assumed to be multivariate Gaus- 
sian and the involved prior distributions are also multivariate Gaussian. In that case, the resulting 
posterior distributions can be written as mixtures of Gaussians whose mean and variance can be 
derived analytically, and the resulting MOTABAR estimator /(^) of /(^) turns out to be a mixture 
of smooth rational functions of ^ that can be computed using simple linear algebra. The mixing is 
related to the distribution of argument measurement errors. Without argument measurement errors, 
/(^) is a smooth rational function of f whose algebraic form can be seen as a generalization of or- 
dinary least squares regression estimators. For these reasons, we will focus on the case of Gaussian 
priors and value errors in this article. 

The input data needed to apply MOTABAR are then 

• the measured data, 

• variances and possibly covariances of both argument and value measurement errors, 

• a prior covariance matrix for the value of / and its derivatives of order < p for some p > 0, 

• prior variances for /'s derivatives of order p, 

• and, as the only free control parameter, an integer p larger than the order of any derivative of 
/ one wants to estimate. 

For particular cases of error and prior distributions, it will turn out that the MOTABAR estimate 
equal or approximate the results of some well-known other methods, including ordinary polynomial 
regression, inverse distance weighting, and linear interpolation on regular grids. This behaviour 
of our estimator is similar to that of a related approach by Wang et al. (2010a) in which / is also 
approximated by a Taylor polynomial at a moving position of interest ^, but in which the Taylor 
coefficients are then estimated not by Bayesian updating but by minimizing a heuristic loss function 
motivated by approximation theory (see Sec. 5). Other cases of error and prior distributions result in 
plausible generalizations or variants of well-known methods, e. g., a new form of local polynomial 
interpolation and a Bayesian variant of inverse distance weighted smoothing. 

In practise, the needed prior and error variances and covariances might themselves be estimated 
from other or even the same data, a question we do however not address in detail in this article. 

Comparison with other methods. MOTABAR can be interpreted as a kind of local regression since 
although it takes into account also data points far away from ^, it gives them much less influence on 
the estimate than those close to ^. This fact is reflected in the occurrence of a weight matrix W in the 
estimator equation. But unlike other local or piece-wise methods such as nearest or natural neigh- 
bour regression, locally weighted scatterplot smoothing (LOESS) (Cleveland, 1979), or splines, the 
MOTABAR estimate /(^) is infinitely smooth (i. e., infinitely often difFerentiable), at least as long as 
the value measurement errors have nonzero variance, and there are either no argument measurement 
errors or their probability density is infinitely smooth (e. g., when argument measurement errors are 
Gaussian as well). This is because in MOTABAR the weight of each individual data point in the 
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estimate depends smoothly on its distance from ^, whereas in other methods the weights can switch 
from zero to nonzero in a nonsmooth way as ^ moves. 

The nonsmooth change in weights that other methods involve is also counter-intuitive when 
arguments can only be measured with some error. E. g., suppose that measurements resulted in 
the argument-value pairs (-1,0), (0, 1), (1,0), and (1/100,-1), where the argument measurements 
involved an error of magnitude 1/10, so that the last measurement might actually reflect /(-I/ 100) 
instead of /(l/lOO). A linear interpolant of the four measurements would have a slope of x +1, but 
when the (1/100, -1) measurement is moved towards (1/100,-1), the slope would discontinuously 
switch to a; -1. A similar effect occurs with splines. In other words, when the ranking of the 
argument measurements with respect to their distance from ^ is uncertain due to measurement errors, 
it seems inappropriate to use a method that strongly relies on the correctness of this ranking, e. g., by 
using only the k nearest neighbours of ^; although disregarding faraway measurements or extreme 
observations completely instead of just downweighting them might still be advisable to increase the 
robustness of the method if outliers might exist, e. g., due to fat-tailed error distributions. 

One existing class of methods. Inverse Distance Weighting (IDW) (Shepard, 1968), also uses 
smoothly decaying weights that only depend on the distance from ^. In a commonly used variant of 
IDW, the weights are inversely proportional to the square or a larger power of the distance, and we 
will show that this method can be derived as a special case of MOTABAR in which a noninformative 
prior distribution for (p is used. 

A common feature of many interpolation and smoothing methods, including IDW and other 
weightings- or kernel-based methods, linear interpolation, and nearest or natural neighbour inter- 
polation, is that the estimate cannot exceed the largest measured values, even if the data strongly 
suggest a nonzero slope at the largest data point. Such methods will therefore always underestimate 
the maxima of /. Other methods, like polynomial regression, can in some situations 'overshoot' 
and result in estimates that lie far outside the measured range. Spline methods are often considered 
a good compromise between these two behaviours regarding maxima. Depending on the choice of 
p, MOTABAR will behave quite similar to spline methods in this respect. As a consequence, the 
weights of individual data values in the estimate might be < or > 1 . 

Similar to other nonparametric methods or methods with many parameters, the MOTABAR 
estimate might fit the data too narrowly and show too much fluctuations due to this 'overfitting' 
when p and the prior variances are badly chosen. If the prior variances for higher derivatives grow 
too fast, the estimate is allowed to vary on smaller scales than the sampling density can resolve 
reliably in view of the assumed error distributions. The amount of overfitting can thus be controlled 
by varying p and the priors. 

1.3. Framework 

Assume that we are interested in the value of a certain function / : R'' — > R and all its derivatives 
of order < at a certain position of interest ^ e W', where p > and / is assumed to be p times 
continuously differentiable. We take the convention to write elements of R'' as column vectors and 
to enumerate their d components with a subscript index, hence ^ - (^i , . . . , ^j)' where the ' symbol 
denotes transposition. Assume that the only information we have about / is (i) measured data 
points (x"',y"') with i - I . . .N, (ii) some estimates of the magnitude of measurement errors, and 
(iii) some beliefs about the variability of / and some of its derivatives. These assumptions will be 
made more precise later We enumerate measurements with superscript indices in guillemots, hence 
x'" - (Xj", . . . , Jc^")'- Assume the /-th data point {x"\y'^') is the result of trying to measure / at the 
argument x'" , but the measurement might involve an error in both the argument and the value, so 
that the actual result y'" is the value of / at a slighly different argument x'" - - j'" (where 
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^<!> g (.jjg argument error) plus some va/we error e'" e R. In other words, 

3;<'>=/(x<'' -/'■>) + £-. (1) 

Multi-index notation. For dealing with higher-order derivatives and multidimensional Taylor poly- 
nomials of /, it is convenient to use a multi-index a e N'' consisting of normegative integers or^ > 
for = \ ...d. We then use the denotations 

a = (tti, Qr2, . . . , ccd)', |a| = ori + ar2 + • • • + OTrf e N, (2) 

or! =ai!Qr2! eN, = • • • e R, (3) 



where x & and |q;| < p. Note that the order of differentiation in {D''f){x) is unimportant for 
\a\ < p. Our quantities of interest are then the derivatives 

^<a> ^ ^D"m) (M < P). (5) 



2. Using Taylor's Theorem to get a local model of the function at the position of Interest 

For a given position of interest ^ € R'', a relationship between our quantities of interest and the 
data y'" is given by Taylor's Theorem, which leads to 

= X<"'ip'»> + r'^ + s''\ (6) 

a,\a\<p 



where 



^ (7) 



a,\a\=p 

^<-> = (Z)"/)(f + A<"'>Cir<'>-^)), A'^'eiO,!] i\a\=p). (8) 

Note that there are m - (''^^'') many choices for a with la] < p and many choices for a with 

|a| = p. It will be convenient to stack all relevant quantities into the column vectors 

X — (x\^\ . . . , x'^\ . . . , x'f', . . . , x'^^y , (9) 
r^{y\'\...,yf,...,y\''\...,yT)', (10) 

X = x-y,y = {y''\ . . . s = (e<'>, . . .,s''^J, r = . . . , r''^')', 

,p = (ifi'"' : \a\ <py, ^ = (^ir<-> ■.\a\ = p,i=l...N)', (11) 

and the matrix 

X = {T'-'" ■.i = l...N,\a\<p). (12) 

Note that the column vector has a row for each combination of a and i, and we write this row 
index as ai. This must not be corrfused with the row-and-column index pair i, a of the matrix X. 



6 Jobst Heitzig 

In other words, (p is an m x 1 vector, is {^^^Y)^ ^ ^' X is an x m matrix. Eq. 6 is now 
summarised in matrix notation as 

y^Xip + (r + e). (13) 

Although this is formally a linear regression model, we are not interested in estimating its coefficient 
matrix X (which we know already up to some measurement errors), but in estimating the regressor 
<p, and this we need to do for each position of interest ^ separately. This is the reason why we next 
apply Bayes' Theorem to Eq. 13. 



3. Using Bayesian updating to estimate the function value and derivatives 

3. 1 . General approach 

Let us model our information about the quantities of interest (p'"' = {D" f){^) and about the other 
unknown terms in Eq. 13 as Bayesian beliefs, i. e., in the form of (subjective) probability distribu- 
tions, by treating ^ and ip as random variables with Lebesgue-integrable densities q{-). Because in 
general, each if/""' corresponds to an argument ^ + A'"" that is different from ^, we assume ^ and tj/ 
are independent. Also, we assume that the function-related quantities tp are independent from the 
measurement-related quantities x and y. The measured data x,y then allow us to update our prior 
beliefs giip) about (p by applying Bayes' Theorem to Eq. 13, which leads to posterior beliefs 



Q{(p\x,y)ccQ{(p) I d'^''yQ{y\x) \ d'^r g(r\x,r)g(y\x,r,<p, r). 



(14) 



Note that we follow the general convention to use the same symbol, here g, to refer to all occur- 
ring probabiUty densities since it will always be clear from the context which variable's density is 
meant. Also, we suppress the domain of integration in the following, and we are not interested in 
multiplicative constants that do not depend on <p, and use the symbol oc to denote equality up to such 
a constant. To utilise Eq. 14, we need to specify 



• a distribution giylx) for the argument errors that might depend on the arguments. 



• a distribution g{e\x,'y) of the value errors that might depend on the arguments and argument 
errors, and 



• prior distributions g(<p) and g(i(/) expressing our initial information about the values and 
derivatives of / at ^ before the measurements. 



The term g(r\x, y) can then be determined from g(i//) using Eq. 7, while the term g(y\x, y, (p, r) can 
be determined from g{e\x,y) using Eq. 13. As an estimate of ip one can then use any measure of 
central tendency of the posterior distribution as given by Eq. 14, e. g., the posterior mean, median, or 
mode, while an estimate of the estimation error would be given by a suitable measure of dispersion, 
e. g., the posterior variance or the median distance from the median. Although this strategy can in 
principle be applied to error and prior distributions of any form, our approach becomes especially 
simple in the Gaussian case. 
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3.2. Gaussian value error and priors 

For the rest of this article, we will assume that value errors and priors are (multivariate) Gaussian, 

g{£\x,y) oc e\p{-e'Pi;e/2), (15) 
gitp) oc exp{-(^ - n^yP^{<p - fi^)/2], (16) 
g(t}f) oc exp{-(i^ - fi^yP^ifr - fi^)/2], (17) 

where we use precision matrices P. = £r' and omit to denote the dependency on x, y to simplify the 
following equations. Note that P^. is a A^ii x Nd square matrix whose rows and columns we address 
using indices of the form ik with / < and k ^ d. The case in which it is known that there are no 
value errors requires some special treatment since then Eg = so that Pg does not exist. In that case, 
Q(e\x,-y) behaves Uke a Dirac delta function, giving J d^Bg(e\x,Y)g(s) - g(0) for all integrable 
functions g of e. 

The Taylor remainders r given x and y are then also Gaussian with mean fir and covariance 
matrix given by 

i4> = J]r'^'''^4"\ t-^' = Y,^''"'i.;'^^'r^-^\ (i8) 

M=P \a\=\P\=P 

Note that for almost all choices of y and 1.^, the matrix is nonsingular. We will therefore assume 
that y has a continuous distribution and is nonsingular, so that almost surely is nonsingular. If 
it is known that 7 = 0, we assume instead that x, ^ are in general position, in which case E, is also 
nonsingular. 

To get an understanding of the relative sizes of the entries in E, , consider the special case in 
which 1.^ is block-diagonal with identical N x N blocks V, so that Y.'^'^^^' = SapV"'-'\ Then Ej.''-'* = 
y<'j>{(^<'> _ ^Yij^'j' - ^)]P I pi 4: 0, showing how the covariance of the remainders r*" and r'-'* grows 
with the p-th power of the scalar product x"^ and;\f'^\ If y is diagonal, also E^ is diagonal and 
the variance of the remainder r'" grows with the 2/?-th power of the distance between the actual 
argument of measurement and the position of interest 

Now the main step to solving Eq. 14 is to see that the posterior of (p given jc, y, and y is stiU 
Gaussian, 

g{(p\x,y, y) oc exp{-(^ - fi^yP^itp - /i^)/2), (19) 
with precision matrix P^ = P^iy) and mean /i^ - Ji^iy) given by 



P^r) ^P^+X' WX, fi^iy) = P-' [P^fi^ +X'W(y- fir)} , (20) 



where we call 

W^('Lr + EJ-' (21) 

the squared weight matrix. Note that the basic behaviour of W"'-'' is to decrease with a growing 
distance of;^^*" and;^*^* from ^, similar to the weights used in IDW. 



Singularities. Although in pathological cases, some of the involved matrices can be singular, they 
are nonsingular in general. More precisely, assume that 7 has a continuous distribution and E^ is 
nonsingular, so that E^ is nonsingular, too (see Eq. 18). Then W exist and is nonsingular for almost 
all choices of E^, including the case E^ = 0. In that case, and for generic^ and ^, X has full rank 
min(A^, m) (like a Vandermonde matrix), so that also X'WX is nonsingular when m < A^. Finally, 
under these assumptions P^ is nonsingular for almost all choices of f ^, including the case P^ = 0. 
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3.3. The Moving Taylor Bayesian Regression estimator 

Integrating over all possible values of the argument errors 7 finally shows that the posterior of <p 
given X and y i&a mixture of Gaussians whose mean and covariance matrix are 



This posterior mean, which we call the MOTABAR estimator, can now be used as a natural estimate 
of if given x, y, and one can see that this estimate is 

• an affinely linear function of the measured values y"\ with weights that decrease with growing 
distance between ^ and x"\ and 

• a mixture of values fi^piy) each of which is an infinitely smooth rational function of the posi- 
tion of interest ^ that has no poles. 

Because the entries of the inverse of an m x m square matrix A are rational functions of the entries 
of A of degree at most {m - \,m), the degree of fi^iy) is at most (2p^N, 2p^N). In practice, one can 
determine fl^iy) and Pipiy) like this, avoiding the large NxN matrix inversions and solving instead 
two systems of linear equations: 

(a) Given ^, x, and 7, compute X, fi,-, and from Eqns. 12 and 7. 

(b) Determine A = and B = W(y - fir) by solving (Lr + Se)(A, B) ^(X,y- Hr). 

(c) Compute = + X'A. 

(d) Determine /i^ by solving P^/i^ = P^t^tf 

+ X'B. 

Note that in pathological cases, either of the two systems might not be solvable uniquely, which 
possibility we do not discuss here. Although the usually smaller m x m matrix inversions needed 
to determine from in Eq. 23 are not as easily avoided, one can at least determine the posterior 
variance of /(^) = ip'^^' by solving P^c - ( 1 , 0, . . . , 0)' for c and using (P^ ' = c*"' in the integral 
in Eq. 23 to determine S^''"'. If the argument errors 7 are not known to be zero, the integrals over 7 
will usually have to be evaluated numerically even when 7 is Gaussian as well, since the integrand 
is a nonlinear function of 7. If 7 has small variance, it can however be feasible to approximate 
the integrals by using a quadratic or 4th-order approximation of /i^(7), which will be explored in a 
separate article. 

3.4. Equivariance properties 

In addition to the obvious translation invariance in all dimensions, the MOTABAR estimator is 
equivariant under various linear scaling transformations: (i) When Sg, S^, and 1,^ are all multiplied 
with the same constant C > 0, then ip remains unchanged and £^ is multiplied by C as well, (ii) 
When for aU all a, and some k, ^j., x^", and (fi'y)k are all multiplied with the same constant C > 0, 
the distribution of 7 is stretched by a factor of C along the A:-th axis, and when fj.'^\ the a-th 
row and column of E^, and the ai-th row and column of 1,^ are all divided by C"', then also ip'"' and 
the a-th row and column of are divided by C'^ for all a. 




(22) 



(23) 
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Fig. 2. (Colour online) Illustration of special and limit cases of MOTABAR interpolation for an equidistant 
sample (black dots) from Runge's function 1/(1 + 25.i;^) with one missing measurement at 0.2. Left: Spline 
interpolation compared to MOTABAR cases 4.13 (piecewise polynomial based on four nearest neighbours) 
and 4.8 (Lagrange polynomial). Right: MOTABAR cases 4.4 (local cubic polynomial regression interpola- 
tion), 4.1 1 (p = 2 with uncorrelated errors and priors), 4.12 (4-th order IDW), and a smoothing case {p = 5, 
small error, penalised intermediate derivatives), all resulting in nonpolynomial rational functions. 

4. Special and limit cases 

To better understand the effects of the various inputs to MOTABAR, let us consider a number of 
special and limit cases, some of which turn out to be equivalent to well-known existing methods, 
whereas others are presented to show limitations of our approach. Fig. 2 illustrates the diversity of 
results one can get from the same data. 

4. 1 . Vanishing argument errors 

For vanishing argument errors, q{y\x) becomes a Dirac delta function, so that 

^ = Mr = 0), = p^ir = or'/2, (24) 

hence the estimate is a pole-free rational function of ^ of degree at most (Ip^N, 2p^N). For most of 
the remainder of this section, we only derive - P^iy) and fi^ - fi^iy) which can then be plugged 
into Eqs. 22 and 23 if argument errors exist. 

4.2. Noninformative priors 

If there is no prior information about /, an improper i^-prior with — > and a centralised i/r-prior 
with //^(, — > can be used, in which case 

P^ X'WX, fi^ ^ {X'WXy^X'Wy. (25) 

In this case, fi^ is that value of ip which minimises the quadratic function 

L{ip) = \\Atp - b\\l with A = W^'^X, b = W'-'^j, (26) 

which provides an alternative way of numerical computation. Although this is formally a weighted 
least-squares estimator, it does not estimate a global set of parameters j6 as in a global linear regres- 
sion model y - X/3 + e, but is a local model in which X and W depend on ^, so that the terms in the 
estimator have to be computed for each ^ separately. 
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Consistent estimation of polynomial components. If f ^ = and fi^ - 0, we have P^^^X'WX = /. 
Hence if ^ = and J is a monomial withy'" = (;(:'")"■/«! for some a with |Qf| < then j equals the c- 
th column of X and hence //^ = P^^X'Wy equals the a-th column of /. This means that the estimate 

juf' is zero for a ^ /3 and one for a - /3, which are the correct derivatives of the given monomial 
at ^ = 0. Consequently, MOTABAR with improper priors is also equivariant under the addition of 
polynomials, e. g., a quadratic trend in a time series, a property shared with numerical differentiation 
via discrete Legendre polynomials. However, while the latter has ip = C(x,^)y with an orthogonal 
coefficient matrix C{x,^), the MOTABAR coefficient matrix P^^X'W is not orthogonal in general. 

For proper priors with i^Q and //^ - 0, the absolute value of the a-th derivative is underesti- 
mated since the prior drags the estimate towards its mean at zero, and the absolute value of the other 
derivatives is slightly overestimated (nonzero). This effect becomes smaller with increasing p. 



4.3. Vanishing value errors: interpolation 

For vanishing value errors, we have 2^ — > 0, hence W ^ Pr - 2,7' and 

P^^P^+X'P.X, fi^^ p-^'[p^fi^+X'P,.(y-Hr)}. (27) 

In the limit case of Eg - 0, this can also be derived directly from Eq. 14 by substituting r = y - X<p. 
In that case, however, can become singular when ^ — > x'" for some /, which has to be taken care 
of in the numerical solution, e. g., by assuming very small but nonzero S^, or by using the following 
exact solution for the case ^ - x'"'- In that case, i^*"* - y'" and, using the notation MQ^t),V(j) for a 
matrix M without its j\h row and A:th column and a vector v without its j\h element, 

^^,(0,0) = 2 + o)!;;^,. ,-)X(,-o), (28) 
P^fiO) = -P^ko) [Q^ + ^(;,o)(^'-,(M))"'0'(0 - y'''c - > (29) 

where Q, v are the prior precision and mean of ^(O) conditional on (/j''^' = y'", and e - (!,...,!)' is 
a vector of ones. 



4.4. Vanishing value errors with noninformative priors: 

local polynomial regression interpolation 
If in addition to Eg = we have = and //^/, - 0, we get 

P^ = X'PrX, fi^ = {X'P,-X)-^X'Pry (30) 
for general ^, while for f = x"\ we get = y"\ P^xm = ^ifi)^^{i.Oh ™d 

Ai.,(0) = (4,o)^^f.o))"'^('/,o)^0'(/) - y'e) (31) 

with V - '^rliiy ^^^^ interpolation based on local polynomial regression in which the weights 
P'r'^' decrease as a power of a quadratic form of x'" - ^ and;^'*''* - ^ (e. g., the dash-dotted red line 
in Fig. 2 right). Hence, in contrast to other local polynomial methods such as LOESS, far away 
observations get a positive though small weight. 
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4.5. Diverging errors 

At the other end of the error scale, for — > 0, we get W — > and thus — » and p.^ 
fx^. So if the (f-piioT is proper, the posterior approximates it, otherwise it diverges. The same 
behaviour obtains for a decreasingly informative i/'-prior, i. e., for — > 0. This shows that while 
a noninformative (^-prior might be chosen, the (/r-prior must be proper since it regulates the overall 
variability of the estimate. 



4.6. Penalised higher derivatives 

If the value error variance is finite (S^ 0), one might, on the other hand, assume that the derivatives 
of / of degree > p are negligible compared to the scale of e, e.g., because / is believed to be 
approximately a polynomial of degree at most p - 1. Then, taking the limit fi^ and — > 0, 
we get 

P^^P^ + X'P.X, fi^ ^ P^' [P^n^ + X'P,y) , (32) 

which is a form of Bayesian polynomial regression. Although the squared weight matrix W - P^no 
longer depends on ^ here, this is still not a global regression method since the (^-prior might depen- 
dent on ^, e. g., because one assumes some linear or nonlinear trend or some change in variability 
with ^. 



4. 7. Penalised higher derivatives with a noninformative prior: 

ordinary global polynomial regression 
A truly global regression method can be obtained by using the noninformative y?-prior with — > 
in addition to //^ ^ and I.^^O.lfN > then 

P^ ^ X'P.X, fi^ ^ {X'P,Xr'X'P,y. (33) 

In the limit, this is just ordinary global least-squares polynomial regression with polynomials of 
order p - I and possibly correlated value errors of differing magnitude. 



4.7.1. Lineal regression with argument errors vs. total least squares 

For d - I, p = 2, ifi - {a, V)' , and nonvanishing argument measurement errors, our model becomes 
the "errors in the variables" linear regression model 

^a + b{x'' -y'') + E'\ (34) 



Since this is linear in y,, the integral over y in Eq. 14 can be solved analytically. For ^ = 0, = 0, 

^ thic r^^cllltc in th^^ nr\n_r^aiiccinn nnct^^rinr 



= 0, = 0, Eg = (t\1, and Y.y - o-yl, this results in the non-Gaussian posterior 



g(a, b\x,y) ^ J d^y exp ( - WyWlllcrl - \\y - ae - bx + byWj/lcrl) 

cc (crl + /7V2)-^/2 I _ 113, _ _ bx\\l/2((rl + bW^)}. (35) 
The posterior mode has a = y - bx and 

Ncr'yb' + cT]s,yb^ + {No-la] + crls,, - crlsyy)b - crispy = 0, (36) 
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where x = Y,iX"'/N and y = Y.iy"'/N, i„ = - ■Sy.v = ^/(y'" - y)^, and i„ = - 

x)(y"' - y). If > 0, b is the largest solution of the above equation, otherwise the smallest, and it 
has the same sign as s^y. 

When the same model is estimated with the total least-squares method (aka Deming regression) 
studied already by Kummell (1879), the equation is 

cr^ySy^yb^ + (o-ls_,_, - a^ySyy)b - cr^Sxy = 0, 

instead, which is symmetric under an exchange of "dependent" and "independent" variables and 
leads to a larger absolute value of b. For cr^ <K s^y/N, this difference vanishes, and for cry — > 0, 
both solutions converge to the ordinary least-squares linear regression line with b — > Syy/Syj^. For 
CTg — > 0, total least squares gives b — » Syy/sj^y independently of o-y, while the cubic equation gives 

b |sign(s^^) ^ANcr'^Syy + s% - s^-^j /2Ncrl, 

which still depends on o-y and has b ^ sign(ivy) ^SyyjN ICy — > for a-y — > oo. This compari- 
son shows that unlike in total least squares, it is essential in MOTABAR whether we consider y a 
function of x or vice versa. 

To understand why the effects of value and argument errors are different in the linear model and 
why the cubic equation is not symmetric in x, y, consider the simple case where the measurements 
are x = y - (-2,0,2) and only the middle one, (x'^\y'^') - (0,0), has errors in both dimensions 
which are independent with y'^' , e'^' e {1,-1) with equal probability. Then the real data f(x) are 
either (-2, -1,2), (-2, -1, 2) or (-2, 1,2), (-2, 1,2), both giving slope b - 1 in linear regression, or 
(-2, -1,2), (-2, 1,2) or (-2, 1, 2), (-2, -1,2), both giving slope b ^ 0.84, so that the MOTABAR 
posterior mean is b x 0.92, whereas total least squares results in b - 1 because of the obvious 
symmetry. 



4.8. Larger order with non-informative prior: Lagrange interpolation 

For d - I, N - m - p, and = 0, we get Lagrange interpolation with /i^ - X^^y (e. g., the solid 
green line in Fig. 2 left). For d > I and = m, it depends on the placement of the;^^*" whether the 
resulting polynomial is an interpolant or not (see Gasca and Sauer (2000) for an overview). 



4.9. Minimal order with uncorrelated errors and priors: Bayesian IDW smoothing of order two 
A particularly simple case occurs for the minimal choice of p, p - 1, and when both value errors 
and priors are uncorrelated, so that and 1,^ are diagonal, and - (cr^). Then we have tp - (/(^)), 
Z = (1, . . . , 1)', and = 5ijw''\ where 



1 



^s*'^'>Orr-^^)'- (37) 



k=\ 



The latter is the squared distance between x'" and ^ as measured by the quadratic norm that is 

-•A 



weighted with the prior derivative variances 2"^'*'. The posterior distribution of /(^) given y is then 



Gaussian with mean and variance given by 



= 1 ^ 2 V — :i> ' " 1 ^ 2 V — ^- ^^^^ 
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This is a generalization of inverse distance weighting that takes into account value error via the 
occurrence of cr^" in Eq. 37 and prior information via the occurrence of fi^, cr^ in Eq. 38. We propose 
to call this method Bayesian IDW smoothing. Note that, as expected, fi^ is a rational function of ^ 
of degree at most (Ip^N, 2p^N) = (2N, 2N). 



4.10. Minimal order with vanishing errors and noninformative priors: 

ordinary IDW with squared distances 
Setting -0,P^ - 0, and = cr^I in the preceding, we get ordinary IDW with squared distances, 

where fi^ is now a rational function of ^ of degree exactly {2N - 2, 2N - 2). Note that while /i^ 
is independent from cr^, the posterior variance is proportional to cr^. In other words, our Bayesian 
approach shows that the uncertainty of the IDW estimate of order two is directly proportional to the 
scale of the derivatives of /. 



= ' V ..,...„<,> .)2), v,<. (40) 



4. 1 1. Order two in one dimension with uncorrelated errors and priors 
\f p - 2, d - 1 , //^ = 0, = 0, and Y.^ are diagonal, and - diag(fl, b), then 

a{b + ZiW<-(x''' -m + Zit' 

z-> = w'-{b + Zj w'j>(x'j' - - rii (42) 

= —. i -. (43) 

Note that fi'^' does not only depend on the distances of the x'" from ^ as encoded in w*", but also 
on the relative position of;^^"* and x'-'\ via the mixture terms in z'". See the dashed blue line in 
Fig. 2 right for an example. This dependency vanishes if we let a — » and — > oo, where we get an 
instance of the following case: 



4. 12. Uncorrelated errors, penalised intermediate derivatives, and specific priors: 
IDW smoothing 

Assume that fi^p = Q, fi^ - 0, Ee is diagonal, = diag(a,b, . . .,b), and "Z'^'-^j' = 5ap6ijv"\ If we 
let the prior for become noninformative by letting a — > 0, but let the prior for (f'"' with Iffl > 
become sharp at (f'"' = by letting b ^ oo, then 



K ^ ; with = - ; -. (44) 

In other words, the MOTABAR estimate converges to the ordinary IDW smoothing (or interpolation, 
if = 0) solution with exponent 2p. E. g., p = 2 and Eg = gives 4-th IDW interpolation (solid 



Z/wV ... </> 1 
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green line in Fig. 2 right). Note that this, however, also shows that using IDW with higher powers 
than two corresponds to implicitly assuming that some derivatives vanish which the result shows 
not to vanish after all. Hence the only plausible form of IDW is the one with squared distances. 

Inconsistent derivatives. This example also illustrates an unintuitive feature of MOTABAR: The 
estimates derivative jl'^' need not coincide with the derivative of the estimated value D"jl'^' with 
respect to ^. In IDW smoothing, the estimated slope is zero as enforced by the sharp prior, although 
the estimated value is not constant. 

4. 13. Large order with penalised intermediate derivatives: piecewise estimators 

On the upper end of the order spectrum, for certain choices of priors, one can choose very large 
orders p without running into numerical infeasibilities. A special limit case obtains with similar 
priors as above: Assume Eg = 0, a noninformative prior for the function value /(f) - ip'^' (so that 
P'^''^^ = 0), an uncorrected and centralised i^-prior with fJ-ip - and 'Z'^''^-'' = dapdijV^", and that all 
derivatives of / of order \ . . .p - 1 vanish at f (so that p.'^' - and 1.'^'"' - for |a| > 0). In the 
limit for p — » oo, then only the nearest neighbour of f is used in the estimation. This is because 
then Pr is diagonal and Pf '' IP'l'" for ; + i, so that g{(p'^'''\x,y,y) ~ exp{-P;.''"((^""* - y''')^}. 
In other words, MOTABAR then estimates /(f) by y"\ as in simple nearest neighbour 

estimation. That is, for large p and these priors, MOTABAR estimate approximates a step function 
that is constant on the Voronoi cells (aka Thiessen polyhedrons) of the sample. As the 4-th order 
IDW example in Fig. 2 (right) shows, the step-Uke shape can already be seen for small values of p. 
A step-like function with a similar smoothening at the edges also obtains for /? — > oo when argument 
errors y are not vanishing, since then /(f) — > 2, P(Vj : - f|| < - flDy"*- On the other 
hand, nonvanishing value errors, even if very small, can lead to additional levels at the means of 
more than one nearest neighbour, showing that this limit behaviour is quite unstable (see, e. g., the 
thin yellow line in Fig. 2 right). 

One can also get higher-order piecewise polynomial estimates, by assuming that only the deriva- 
tives of / of order q + 1 . . . p - 1 vanish at f for some q < p, and using an improper prior with 
P'^'°" - for \a\ < q. Then the MOTABAR estimate approximates a piecewise polynomial in- 
terpolant of degree q that uses the observations nearest to f (assuming those are in general 
position). For q = I, this leads to a piecewise linear estimate that need not, however, coincide with 
ordinary linear interpolation since, e. g., the two nearest neighbours of f e R might both lie left of f , 
leading to a discontinuity in the estimate to the right of f . Only when the positions x'" build a regu- 
lar polyhedral grid, the estimate approximates ordinary linear interpolation. For d = I, this requires 
equidistant measurement positions, and for li = 2 a regular triangular grid is required, whereas on 
a square grid, one gets a discontinuous linear interpolation with four square interpolation cells per 
grid cell, each corresponding to a different choice of three of the four corners of the grid cell. Anal- 
ogous results hold for higher-dimensional rectangular grids. For q = 3 and d = 1, the limit result is 
a piecewise cubic polynomial which is, however, not a cubic spline since although its 2nd derivative 
is continuous, its value and slope are not (e. g., the dashed blue line in Fig. 2 left). 

4. 14. Extreme arguments of interest 

If the distance between f and the data positions x'" diverges, i. e., if ||f - x"'\\2 — > we have 
X'WX — > 0. If the i^-prior is informative (P^ + 0), we then have 

P^ P^, fi^ fi^, (45) 
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that is, the data are too far away to drag the posterior significantly away from the prior. With a 
noninformative i^-prior, on the other hand, the estimate for ||^ -;t"'ll2 ^ °° will either approximate 
the sample mean (if p = 1) or will diverge to +oo (almost surely for p > 1). 



4.15. Large amounts of data: conjectured exponential rate of convergence 
If the x'" are either regularly distributed or drawn from a sufficiently smooth distribution, the dis- 
tance between ^ and the closest will decay at a rate ~ A^"'^"' for — > co. We conjecture that 
then the posterior precision grows at a rate 

P<»fi> ^ ^iH2p-\a\-V}\-i)/d ifS^ = (conjectured) (46) 

P'^''^'~N ifl^^to (conjectured) (47) 

for nonpathological error distributions and priors. The rationale for this conjecture is this: Assume 
that Eg = 0, "L^ - cr^I, and the arguments x'" ^6 sufficiently uniformly distributed over a 
i/-dimensional ball of unit radius about ^, and let \a\, \/3\ < p. Then 

and this sum is asymptotically proportional to the integral 

Jz=N-'l'i 

The conjecture implies 2;"'^' ~ l/N^^'-^P-^"^-^^-^^'''jnpanicu\iii\if''''-(D"f)(^)\ ~ a?(i/2+M-/')/''-i/2. 
In particular, for d - \, the conjectured rate of convergence of <p'°' to /(^) is NP, the same as for 
spUne interpolation. 



5. Comparison with Wang et al.'s approach 

Wang et al. (2010a) introduced a different approximation scheme that is also based on a local Tay- 
lor expansion. Adapting their terminology and notation to ours, that scheme can be described as 
defining an estimator 

/wangCf) = a'y (48) 
for /(^), where the weights a, are chosen to minimise the squared norm 

||a||^,„g = a'iXV^X' + y,)fl (49) 

subject to a number of constraints 

i:,fl"> = l, 2,a"'X<''"> = (50) 

for all a with < Iff] < ^ for some q < p, where and V,, are diagonal matrices whose entries are 
certain parameters cr'^' and error variances. 
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Although the parameters (cr'^')^ correspond to the prior variances in our approach, Wang et al.'s 
rationale is not Bayesian. Their motivation is rather that ll«ll^3„g is an estimator of the squared 
approximation error {/wang(^) - /(^))^^ and the constraints make sure that the estimator is correct 
if the true / is a polynomial of degree < q. As we see from the usage of V^p and V, instead of 2^ 
and I.y = their scheme implicitly assumes uncorrelated "priors" and errors, but one can easily 
adapt it to deal with correlations by using 

\\a\\i„,^a'(XZ^X' +i:,)a (52) 

instead of ||a||^3„g. Also, they do not consider the case of argument errors, and for these it does not 
seem obvious how to deal with them consistently in their framework. 

To compare their scheme to ours in the case of no argument errors, note that the MOTABAR 
estimate fi^ is a linear function of y only when the prior for (p is improper and the prior for ^ is 
centralised, while otherwise the estimate is only an qffine function ofy. Hence let us assume = 
and fi^ = in this section. In that case, the MOTABAR estimator itself is that (p which minimises 
the quadratic function 

L{<p) = \\A(p - b\\l with A = W^'^X, b = W^'^y. (53) 

However, while in Wang et al.'s approach a constrained problem obtains, the above is an uncon- 
strained problem. Taking the corresponding limit of /"^ — > in Wang et al.'s scheme, their target 
function becomes 

llallL,. ^ a'XY^X'a (54) 

so that their coefficients depend on the relative variances (and correlation structure) of (f but no 
longer on S, , while ours depends on the latter but not on the former 

Despite these differences, both approaches seem to have the same rate of convergence for — > 
CX3 and reduce to ordinary polynomial regression or IDW for certain choices of control parameters. 



6. Non-Gaussian cases and constraints 

Prior knowledge about (p, such as constraints of the form {D"f){x) € [a,b] for some a and a,b & 
R U {±oo), can easily be incorporated into MOTABAR estimation by choosing a suitable prior 
distribution for (p, such as a uniform distribution of ip"" on {a, b} or a Gaussian restricted to this 
interval. If one then rewrites the chosen prior in the form 

q{(P) oc go{(p) exp{-(^p - fi^yP^{(p - fi^)}, (55) 

which can be seen as a generalization of Eq. 1 6, it is easy to see that the resulting posterior simply 
involves the same factor go((p), leading to a posterior distribution of 

gitp\x,y) oc goi(p) J dYQ(-y\x) exp{-(^ - fi^)'P^{<p - fi^)] (56) 

with the same ji^ and as before. However, since the posterior mean is then no longer equal to p.^, 
one either needs to integrate Eq. 56 explicitly or use the posterior mode instead. 

In some cases, the posterior mean can be determined analytically. E. g., two-sided constraints 
of the form (D" f){x) e {a,b} with finite bounds a,^ G R can most easily be modelled by putting 
P'^''"' = and qq{>p) = 0(^"°'> - a'"')<S){b'"' - ip'"'), where is the Heaviside step function with 
©(^) = 1 for ^ > and &((p) - otherwise. For one-sided constraints of the form {D" f){x) > a*"*, 
one can use qq{<p) - @(ip"" - a'"'). 
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6.1. Function value constrained to an interval 

Using the prior factor goi(p) = ©(^*°' - d)@ib - ^*"')> the marginal posterior density of (f'^' is 
proportional to 

0(^c* - a)®(b - ifi'^') exp{-((^<0' - fif/2cr^} (57) 
with = /i*' and cr^ = (P^^ I2Y^'^\ For a density of this form, the mean is given by 

exp{-OL/ - fl)V2tr^) - exp{-(A^ - hfjlcr^] [2 

v = u + cr — — -i/-G(a, o). (58) 

erf{Ou - a) / ^/2o-} - erf{(ju - b) / ^/2o-} > ;r 

For yu outside [a,/?], this asymptotically equals b - cr(-l/z + - lO/z^ + 74/z^ - 706/z^) with 
Z = (b - fi)/cr for yL/ ^ oo (good approximation for ju > ^ + 4cr), or a + cr(-l/z + - 10/z^ + 
74/z^ - 1061 z^) with z - ajjcr for — > -oo (good approximation for yu < a - Act). With this 
constraint, the MOTABAR estimator becomes 

(p = ^ dyQ{y\x)v, (59) 

where v depends on y via yU and cr. 

6.2. One-sided constraints 
Letting — > oo in the preceding, we get 

exp{-0u-a)V2cr^} [2 

V = Li + cr -i/ - > a. (60) 

eT^{(n-a)/^/2(T} + l V ;r 

6.3. Non-Gaussian errors and quantile regression 

To see that our methodology can be fruitful also in the case of non-Gaussian ^if-priors and value 
errors, consider the case of d - 1, vanishing argument errors, an improper ^-prior, independent 
Laplace ^-priors with gii//^") oc exp(-b"'\t//'^'\) for some b'" > 0, and independent value errors with 
asymmetric Laplace distributions q(s"') oc exp{-a"*^)^<;.(e*")), where Qriz) = z{t - 0(-z)), a'" > 0, 
andr* e [0,1]. Then 

g(y\x, <p)^Y]J exp{-a<'>er..(/ " - (X^pT' - r"') - b'''\^\} 

0^ n giy''" - (X(py'\ a''\T''\ b'''/\X'''P'\}, (61) 

i 

where 

( exp{za(l-T)) _|_ exp{zo(l-T))-expfec) _^ exp(zc) jf 7 < Q 
c+a(l—T) c-a(l-r) c+ar ^ 
exp(-zaT) _|_ exp(-;aT)-exp(-zc) _^ exp(-,-c) > ' 

c+aT c-ar c+a(l-T) ^ ^ 

and the posterior mode of (p given x, 3) is the one that maximises g(y\x,(p). For a'" = a, t*" = t, 
and sharp i/r-priors with fc*" ^ 00, this reduces to minimizing the quantile regression loss function, 
EiPriy" - (XipY"}, so the above can be considered a novel form of local polynomial quantile 
regression. Similar analytical forms of g obtain also for other forms of (/^-priors, e. g., Gaussian or 
uniform ones. 
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Fig. 3. (Colour online) Typical effect of priors for data points placed uniformly at random (black dots) from 
fix) = sin(<i;x) (thin dashed black line), with at = I and p = 5. Left: A' = 6, no value error. Right: A' = 12, 
medium-sized lid value error with cr^ = 1/4. In 100 such samples, the median L2 distance between true 
and estimated function in the left[right] case was 0.068[0.24] for proper correlated i^-priors growing as 
a* oc with the correct frequency co (thick solid cyan line), 0.15[0.29] for proper uncorrelated priors 
of the same size (thick dash-dotted cyan line), 0.22[0.48] for proper correlated priors with oc (oj/lf 
(thin solid blue line), 0.25[1.7] for improper priors (thin dash-dotted red line), and 0.56[0.55] for proper 
correlated priors with a'^' oc (2ai)* (thin dotted green line). 

7. Choice of control parameters 

7. 1 . Consistent derivatives 

With a noninformative (^-prior, the inconsistency between jl'^" and the a-th derivative of /i^"' with 
respect to ^ will decrease with p and vanish for — > 00. Therefore, if one is interested in all 
derivatives of order < q, we suggest to use a noninformative prior for all if'"'' with \a\ ^ q, and use 
a p somewhat larger than q, using either informative or noninformative priors for the derivatives of 
order > q. 

7.2. Priors for oscillatory behaviour 

Correlation of derivatives. Assume that d - I and f(x) is of the form 

Jr»oo 
da> F{tj) sin(wx -1- tt^) (63) 


with a spectrum F(cl)) and phases t^) chosen uniformly at random. Then the covariance of f''"(x) 
and f'^'ix) is 

J „oo f 1 if ^ - ^ = mod 4 

•Z<k,e> ^ I dcoFicofJ^^-l ifk-{€{l,3} mod 4 (64) 
^ -'0 [ -1 if;t-^ = 2 mod 4 

In other words, the k-th and [k + 4)-th derivatives are fully correlated, and the A:-th and (k + 2)-th 
derivatives are fully anti-correlated (see also Gibson et al. (1992)). 

If the spectrum of / is known approximately, Eq. 64 the above is therefore a natural choice 
for the i^-prior covariance structure. For unknown spectrum, however, using a prior with strong 
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correlation between derivatives can result in worse results than using an uncorrelated or improper 
prior, as can be seen in the example in Fig. 3. 

Growth of derivative variance. If F{(l)) is concentrated at a single frequency ujq, i.e., f(x) = 
Fq sm((L)ox+to), the variance of f''"(x) is growing exponentially: E^*^'*' - F^oj^l'/l. For a broader but 
bounded spectrum, it is growing sub-exponentially. E. g., a uniform spectrum of the form F(cS) - Fq 
for (L) < ojQ and F(a>) - for oj > loq gives l.'^''" - FqW^*^' / (4k + 2). For typical unbounded spectra, 
in contrast, the variance growth is super-exponential. E. g., for an exponentially decaying spectrum 
with F(ol>) - Fqc^"" for some a > 0, as is often assumed of chaotic dynamical systems, we get 
J.f''' = Fl(2k - l)!fe/(2fl)2*+'. Similarly, for a Gaussian decaying spectrum with FioS) = F^e-'^'^^" 
for some v > 0, we get l.f'" = Flv''^^^^T{k + l/2)/4. In case of a power-law decaying spectrum, 
the variance even diverges for large k. E. g., for F(lo) - Fq{\ + atoy'', we get S^*^'*' - F^Tilk + 
l)r(2b -2k- l)/2fl2'^+ir(2Z?) for A: < ^7 - 1/2 and I.'^'''' ^ oo for k > b - 1/2, justifying the use of 
an improper prior 



7.3. Data-driven choice of remainder prior 

Without argument error, the posterior predictive distribution of y'" from the Taylor model about 
^ - x, has 

E(/'0 = fi'^'i^ = X,), Var(y'') = tf'\^ = x,) + (65) 

If the model is correct, one therefore expects that approximately 

4"5:*»-« = ->,) + zr 

Likewise, for any fixed ^, and if f ^ = and ju^ - 0, the posterior predictive distribution of W^^^y 
from the Taylor model about ^ has 

EiW^'^y) = W'^^Xjl^, (67) 
Cov(W^'^y) = W^'^XP^^X'W^'^ + W^'^{I.r + SJW'^^ (68) 

= w^'^xix'wxy^x'w^'^ + 1. 

If the model is correct, one therefore expects that approximately 

\\W^'\Xfi^ - y)\\l = tiacel W^'^XiX'WXy^X'W^'^ + I]^m + N. (69) 

Hence, assuming 1,^ = vS with a known matrix S and unknown v > 0, one could either use the 
same v for all ^ and choose it so that the "global" Eq. 66 is fulfilled, which is similar to the suggested 
parameter choice in Wang et al. (2010b), or use a different v for each ^ and choose it so that the 
"local" Eq. 69 is fulfilled. 



7.4. Interpolation with noninformative priors 

If Ee = and - vl, does not depend on v, and Eq. 69 becomes 

V = \\D''Hxp^ - y)\\l/(m + N), D<''> = ZaM^/^"-"'^' (70) 
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(see Eq. 18). This value is also identical to the posterior mode of v when v is treated as a scale hy- 
perparameter with a noninformative Jeffreys prior of g{v) cc l/v. Hence the MOTABAR interpolant 
with noninformative priors is 

Ji^^iX'DXy^X'Dy, Hx'DXyK (71) 

m + N 



8. Example: reconstruction of Lorenz attractor from noisy observations 

To demonstrate the performance of MOTABAR for complex data, we simulated a trajectory (X, Y, Z){t) 
of the standard chaotic Lorenz system given by the ODEs X - IQiY - X), Y - -XZ + 28X - Y, 
Z - XY - lOZ/3, where X basically oscillates between ^ +15 with a frequency of ^ w = 8 (see 
Fig. 4,a). We then generated a sample of N - 500 noisy observations X(ti) + s(t), with iid errors 
e{t) ~ N(0, 1) (which corresponds to a; 1% of noise), for time -points f, that were regularly spaced 
at a distance Af - 0.05 (which corresponds to ~ 15 data points per oscillation, see the black dots 
in Fig. 4,f). From the sample, we reconstructed the original trajectory (Fig. 4, a) up to a diffeo- 
morphism, following the approach of Packard et al. (1980) by estimating the time evolution of the 
derivatives (X,X,X) (Fig. 4,b), using different estimation methods. For MOTABAR estimation, we 
used p = 4, an improper ^-prior, and a (/r-prior whose variance cr^ - \5u)''I was chosen to fit the 
approximate range and frequency of the oscillation. Fig. 4(e) shows that the MOTABAR approach 
reproduces the shape of the trajectory much better than either spline smoothing (c) or numerical 
differentiation (d), the latter being basically equivalent to an alternative phase space reconstruction 
method, the often used "method of delays". Note that there are other, more sophisticated state space 
reconstruction methods based on PCA or discrete Legendre polynomials, which deal better with 
noise (Gibson et al., 1992) and which will be compare to MOTABAR in a separate paper. 



9. Conclusion 

Outlook: alternative local approximations. The simple form of the MOTABAR estimator is due 
to our assumption of Gaussian value measurement errors and the fact that Taylor polynomials are 
linear in their coefficients. Alternatively, one might approximate / by other functions that can be 
parameterised by the low-order derivatives of / at ^. For d - I and if it is known that / displays 
oscillatory behaviour, one such approximation could be 

f(x) - a + b sin(<jj(ji: - ^)) + c cos((y(x - ^)) + r{x - ^) (72) 

with a remainder function r with r(0) = r'(0) = r"{0) - r"'{0) = 0. Because then w - '\J-f"'(^)/f'(^), 

c = f'm"(^)/f"'{^), a = m - f'(Of"i^)/f"'ia and b = /'(f) ^PfWfWl the four deriva- 
fives can be estimated from the model if a plausible prior for r(x - f ) is used, although the estimate 
will not be a linear function in y since the above approximation is not linear in the coefficients. If 
/ is known to be periodic with frequency a>, it might seem that one could also use partial sums 
of the corresponding Fourier series instead, which are linear in their coefficients, but they are not 
parameterizable by a finite number of derivatives of / at f . 

Software. An open-source software package implementing MOTABAR for use with the python 
programming language is under development and will be made available at http : //www . pik- pot sdara . 
de/members/heitzig/motabar. 
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Fig. 4. (Colour online) Reconstruction of the Lorenz attractor from noisy measurements of its X compo- 
nent, (a) Simulated true phase space trajectory, (b) Derivatives of X from numerical differentiation from 
simulated true X data, (c) Derivatives from natural cubic smoothing spline for noisy measurements of X. 
(d) Numerical differentiation from noisy measurements, (e) Derivatives from MOTABAR with p = 4 and an 
improper yj-prior. (f) Detail of IVIOTABAR estimated X (thick blue line) compared to simulated true X (thin 
gray line), and noisy measurements (black dots). 
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